AITopics | successor state

Deep Q-Networks (DQNs) estimate future returns by learning from transitions sampled from a replay buffer. However, the target updates in DQN often rely on next states generated by actions from past, potentially suboptimal, policy. As a result, these states may not provide informative learning signals, causing high variance into the update process. This issue is exacerbated when the sampled transitions are poorly aligned with the agent's current policy. To address this limitation, we propose the Successor-state Aggregation Deep Q-Network (SADQ), which explicitly models environment dynamics using a stochastic transition model. SADQ integrates successor-state distributions into the Q-value estimation process, enabling more stable and policy-aligned value updates. Additionally, it explores a more efficient action selection strategy with the modeled transition structure. We provide theoretical guarantees that SADQ maintains unbiased value estimates while reducing training variance. Our extensive empirical results across standard RL benchmarks and real-world vector-based control tasks demonstrate that SADQ consistently outperforms DQN variants in both stability and learning efficiency.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2511.03836

Genre: Research Report > New Finding (0.46)

Industry: Transportation (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Inference of Deterministic Finite Automata via Q-Learning

Hosseinkhani, Elaheh, Leucker, Martin

arXiv.org Artificial IntelligenceOct-21-2025

Traditional approaches to inference of deterministic finite-state automata (DFA) stem from symbolic AI, including both active learning methods (e.g., Angluin's L* algorithm and its variants) and passive techniques (e.g., Biermann and Feldman's method, RPNI). Meanwhile, sub-symbolic AI, particularly machine learning, offers alternative paradigms for learning from data, such as supervised, unsupervised, and reinforcement learning (RL). This paper investigates the use of Q-learning, a well-known reinforcement learning algorithm, for the passive inference of deterministic finite automata. It builds on the core insight that the learned Q-function, which maps state-action pairs to rewards, can be reinterpreted as the transition function of a DFA over a finite domain. This provides a novel bridge between sub-symbolic learning and symbolic representations. The paper demonstrates how Q-learning can be adapted for automaton inference and provides an evaluation on several examples.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2510.17386

Country: North America > United States (0.93)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

fa080fe0f218871faec1d8ba20e491d5-Paper-Conference.pdf

Neural Information Processing SystemsOct-10-2025, 22:04:50 GMT

append, successor, successor state, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California > Santa Clara County > San Jose (0.04)

Genre: Research Report > Experimental Study (0.92)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

One Risk to Rule Them All: A Risk-Sensitive Perspective on Model-Based Offline Reinforcement Learning

Neural Information Processing SystemsOct-9-2025, 11:48:13 GMT

Offline reinforcement learning (RL) is suitable for safety-critical domains where online exploration is not feasible.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report (0.46)

Industry:

Health & Medicine > Therapeutic Area > Immunology (0.96)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.93)

Add feedback

Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning

Neural Information Processing SystemsOct-2-2025, 03:01:38 GMT

Recently, there has been significant progress in understanding reinforcement learning in discounted infinite-horizon Markov decision processes (MDPs) by deriving tight sample complexity bounds. However, in many real-world applications, an interactive learning agent operates for a fixed or bounded period of time, for example tutoring students for exams or handling customer service requests. Such scenarios can often be better treated as episodic fixed-horizon MDPs, for which only looser bounds on the sample complexity exist. A natural notion of sample complexity in this setting is the number of episodes required to guarantee a certain performance with high probability (P AC guarantee).

probability, reinforcement learning, sample complexity, (12 more...)

Neural Information Processing Systems

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Technology: